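Masked auto-encoding is a popular and effective self-supervised learning approach for point cloud learning. However, most existing methods reconstruct only the masked points and overlook local geometry information, which is also important for understanding point cloud data. In this work, we make what is, to the best of our knowledge, the first attempt to explicitly incorporate local geometry information into masked auto-encoding, and propose a novel Masked Surfel Prediction (MaskSurf) method. Specifically, given an input point cloud masked at a high ratio, we learn a Transformer-based encoder-decoder network that estimates the underlying masked surfels by simultaneously predicting the surfel positions (i.e., points) and per-surfel orientations (i.e., normals). The predictions of points and normals are supervised by the Chamfer Distance and a newly introduced Position-Indexed Normal Distance in a set-to-set manner. MaskSurf is validated on six downstream tasks under three fine-tuning strategies. In particular, MaskSurf outperforms its closest competitor, Point-MAE, on the real-world ScanObjectNN dataset under the OBJ-BG setting, demonstrating the advantage of masked surfel prediction over masked point cloud reconstruction. Code will be available at https://github.com/ybzh/masksurf.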
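The Chamfer Distance used to supervise the predicted points can be illustrated with a minimal sketch. It assumes the common symmetric form that averages squared nearest-neighbour distances in both directions; the abstract does not state which variant is used, and the newly introduced Position-Indexed Normal Distance is not reproduced here because it is not defined in the text. The function name and the toy point sets are hypothetical.

```python
def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between two point sets.

    Each set is a non-empty list of (x, y, z) tuples. Assumption: the mean of
    squared nearest-neighbour distances is taken in each direction; other
    variants use unsquared distances or sums instead of means.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def one_way(src, dst):
        # For every source point, find the closest destination point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(pred, target) + one_way(target, pred)


# Toy example: two predicted points versus two ground-truth points.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(pred, target))
```

Because each point is matched to its nearest neighbour in the other set, the loss does not require the two sets to be ordered consistently, which is what makes it suitable for set-to-set supervision.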
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
This paper focuses on the prevalent performance imbalance across the stages of incremental learning. To avoid obvious stage learning bottlenecks, we propose a brand-new stage-isolation based incremental learning framework, which leverages a series of stage-isolated classifiers to perform the learning task of each stage without interference from the others. Concretely, to aggregate multiple stage classifiers impartially into a unified one, we first introduce a temperature-controlled energy metric that indicates the confidence score levels of the stage classifiers. We then propose an anchor-based energy self-normalization strategy to ensure that the stage classifiers work at the same energy level. Finally, we design a voting-based inference augmentation strategy for robust inference. The proposed method is rehearsal-free and works for almost all continual learning scenarios. We evaluate the proposed method on four large benchmarks. Extensive results demonstrate its superiority, setting a new state of the art in overall performance. Code is available at https://github.com/iamwangyabin/ESN.
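As a rough illustration of a temperature-controlled energy metric, the sketch below uses the free-energy score commonly applied to classifier logits, E = -T log Σ_i exp(z_i / T). This form is an assumption: the abstract does not give the exact definition, and the anchor-based self-normalization and voting steps are not reproduced. The stage names and logit values are hypothetical.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free-energy score E = -T * log(sum_i exp(logit_i / T)).

    Assumption: lower energy is read as higher classifier confidence.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


# Hypothetical logits produced by two stage classifiers for the same input.
stage_logits = {
    "stage_0": [4.0, 0.5, 0.1],
    "stage_1": [0.3, 0.2, 0.4],
}
energies = {name: energy_score(z, temperature=1.0) for name, z in stage_logits.items()}

# Under the assumption above, the stage with the lowest energy is the most confident.
most_confident = min(energies, key=energies.get)
print(most_confident)
```

Comparing raw energies like this is only meaningful if the classifiers operate at a comparable energy level, which is the problem the self-normalization strategy in the abstract is meant to address.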
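State-of-the-art deep neural networks still struggle with the catastrophic forgetting problem in continual learning. In this paper, we propose a simple paradigm (named S-Prompting) and two concrete approaches that greatly reduce forgetting in one of the most typical continual learning scenarios, namely domain incremental learning (DIL). The key idea of the paradigm is to learn prompts independently across domains with pre-trained Transformers, avoiding the use of exemplars that commonly appear in conventional methods. This results in a win-win game in which the prompting can achieve the best for each domain. The independent prompting across domains requires only a single cross-entropy loss for training and a simple K-NN operation as a domain identifier for inference. The learning paradigm yields an image prompt learning approach and a brand-new language-image prompt learning approach. With excellent scalability (a 0.03% parameter increase per domain), the best of our approaches achieves a remarkable relative improvement (about 30% on average) over the best state-of-the-art exemplar-free methods on three standard DIL tasks, and even surpasses the best of them in relative terms when they use exemplars.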
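The idea of using a K-NN operation as a domain identifier at inference time can be sketched as follows. This is a minimal illustration under stated assumptions: each domain is represented by a few stored reference feature vectors, the query feature comes from a frozen pre-trained encoder (not shown), and a majority vote over the k nearest references selects the domain whose prompt is then used. The function name, feature values, and prompt bank are all hypothetical.

```python
from collections import Counter


def identify_domain(query, references, k=3):
    """Return the majority domain label among the k nearest reference features.

    references: list of (feature_vector, domain_id) pairs.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(references, key=lambda item: sq_dist(query, item[0]))[:k]
    votes = Counter(domain for _, domain in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D reference features for two domains.
references = [
    ([0.0, 0.1], "domain_a"), ([0.1, 0.0], "domain_a"), ([0.2, 0.1], "domain_a"),
    ([1.0, 0.9], "domain_b"), ([0.9, 1.0], "domain_b"), ([1.1, 1.0], "domain_b"),
]

# Placeholder prompt bank: one independently learned prompt per domain.
prompt_bank = {"domain_a": "prompt_a", "domain_b": "prompt_b"}

domain = identify_domain([0.95, 0.95], references, k=3)
print(domain, prompt_bank[domain])
```

Keeping the prompts independent means that adding a domain only adds one entry to the prompt bank and a few reference features, without touching what was learned for earlier domains.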
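Low-overlap regions between paired point clouds make the captured features very low-confidence, leading cutting-edge models to produce poor-quality point cloud registration. Going beyond the traditional wisdom, we raise an intriguing question: is it possible to exploit an intermediate yet misaligned image between two low-overlap point clouds to enhance the performance of cutting-edge registration models? To answer it, we propose a misaligned-image-supported registration network for low-overlap point cloud pairs, dubbed ImLoveNet. ImLoveNet first learns triple deep features across different modalities and then feeds these features into a two-stage classifier to progressively obtain the high-confidence overlap region between the two point clouds. Soft correspondences are therefore well established within the predicted overlap region, resulting in accurate rigid transformations. ImLoveNet is simple to implement yet effective, since 1) the misaligned image provides clearer overlap information for the two low-overlap point clouds, which helps locate the overlapping parts; 2) it contains certain geometric knowledge for extracting better deep features; and 3) it does not require the extrinsic parameters of the imaging device with respect to the reference frame of the 3D point cloud. Extensive qualitative and quantitative evaluations on various benchmarks demonstrate the effectiveness and superiority of ImLoveNet over state-of-the-art approaches.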
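This paper aims to tackle the challenging task of one-shot object counting. Given an image containing objects of a novel, previously unseen category, the goal of the task is to count all instances of the desired category using only one supporting bounding box example. To this end, we propose a counting model with which you only need to Look At One instance (LaoNet). First, a feature correlation module combines self-attention and correlative-attention modules to learn both inner-relations and inter-relations. This makes the network robust to inconsistencies in rotation and size among different instances. Second, a scale aggregation mechanism is designed to help extract features with different scale information. Compared with existing few-shot counting methods, LaoNet achieves state-of-the-art results while learning with a high convergence speed. The code will be available soon.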
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
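To make the reported attack success rate (ASR) concrete, the sketch below computes it as the fraction of trigger-carrying test inputs that a model classifies as the attacker's target label. This is the usual definition and is assumed here, since the abstract does not spell it out; some evaluations additionally exclude inputs whose true label already equals the target. The function name and the example predictions are hypothetical.

```python
def attack_success_rate(predictions, target_label):
    """Fraction of predictions on triggered inputs that equal the target label.

    predictions: predicted labels for test inputs that carry the trigger.
    Assumption: every listed input counts, regardless of its true label.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


# Hypothetical predictions on four triggered inputs, with target label 0.
triggered_predictions = [0, 0, 1, 0]
print(attack_success_rate(triggered_predictions, target_label=0))
```

An ASR close to 1.0, as reported for DOORPING, means that almost every triggered input is classified as the target label.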
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the BIQA regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT). These modules help the model learn complex distortion patterns and optimize the regression problem in an easy-to-hard order that mirrors the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.